Data Warehouse & Data Lake

Designing Machine Learning Systems

  • A repository for storing structured data is called a data warehouse.
  • A repository for storing raw data, including unstructured data, is called a data lake.

What is a Data Warehouse

Database vs. Data Warehouse

| Database | Data warehouse |
| --- | --- |
| Designed for transactions | Designed for analytics & reporting |
| Data is fresh and detailed | Data is refreshed periodically and summarised |
| Slow for querying large amounts of data | Generally faster for large queries (they do not interfere with any transactional process) |
The process that moves data from a database into a data warehouse is ETL (extract, transform, load).
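A minimal sketch of one ETL run can make the three steps concrete. It assumes two SQLite connections stand in for the transactional database and the warehouse; the `orders` and `daily_sales` tables and the `run_etl` function are hypothetical names chosen for illustration, not part of any particular product.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Copy a per-day sales summary from the source DB into the warehouse.

    Returns the number of summary rows written. Table names are hypothetical.
    """
    # Extract: read the detailed rows from the transactional database.
    rows = source.execute("SELECT order_date, amount FROM orders").fetchall()

    # Transform: summarise the detailed rows into one total per day.
    totals = {}
    for order_date, amount in rows:
        totals[order_date] = totals.get(order_date, 0.0) + amount

    # Load: write the summary into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(order_date TEXT PRIMARY KEY, total REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO daily_sales VALUES (?, ?)",
        sorted(totals.items()),
    )
    warehouse.commit()
    return len(totals)


def make_demo_source() -> sqlite3.Connection:
    """Build a small in-memory transactional database with sample orders."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders "
        "(id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    source.executemany(
        "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.5), ("2024-01-02", 7.0)],
    )
    source.commit()
    return source
```

Running `run_etl` on a schedule matches the "refreshed periodically and summarised" row in the comparison above: the warehouse only sees the aggregated totals, and analytical queries run against `daily_sales` instead of the live `orders` table.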

What is a Data Lake

A data lake is a centralized repository that stores large amounts of raw data in its original format until it’s needed.
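The idea of keeping data in its original format until it is needed can be sketched with a plain directory acting as the lake. This is only an illustration under stated assumptions: the directory layout, the `.jsonl` file naming, and the helper names `land_raw` and `read_clicks` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, name: str, payload: bytes) -> Path:
    """Store a payload byte-for-byte, with no parsing and no schema."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    path = lake_dir / name
    path.write_bytes(payload)
    return path


def read_clicks(lake_dir: Path) -> list:
    """Parse the raw JSON-lines files only when an analysis asks for them."""
    records = []
    for path in sorted(lake_dir.glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp) / "lake"
        # Any format can be landed; nothing is interpreted at write time.
        land_raw(lake, "clicks.jsonl", b'{"user": 1, "page": "/home"}\n')
        land_raw(lake, "scan.bin", b"\x00\x01\x02")
        print(read_clicks(lake))
```

Writing never inspects the payload, so binary files and JSON logs sit side by side; structure is applied only in `read_clicks`, at the moment the data is needed.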

Data Warehouse vs. Data Lake

| Data warehouse | Data lake |
| --- | --- |
| Data has already been processed and is stored in a relational system | Data is raw and stays unprocessed until it is needed for analysis; it can also hold a copy of an entire OLTP or relational database |
| The data’s purpose has already been assigned, and the data is currently in use | The data’s purpose has not been determined yet |
| Making changes to the system can be complicated and require a lot of work | Systems are highly accessible and easy to update |

What is a Data Mart